Classifying Texts using Relevancy Signatures

Authors

  • Ellen Riloff
  • Wendy G. Lehnert
Abstract

Text processing for complex domains such as terrorism is complicated by the difficulty of reliably distinguishing relevant from irrelevant texts. We have discovered a simple and effective filter, the Relevancy Signatures Algorithm, and demonstrated its performance in the domain of terrorist event descriptions. The Relevancy Signatures Algorithm is based on the natural language processing technique of selective concept extraction, and relies on text representations that reflect predictable patterns of linguistic context. This paper describes text classification experiments conducted in the domain of terrorism using the MUC-3 text corpus. A customized dictionary of about 6,000 words provides the lexical knowledge base needed to discriminate relevant texts, and the CIRCUS sentence analyzer generates relevancy signatures as an effortless side-effect of its normal sentence analysis. Although we suspect that the training base available to us from the MUC-3 corpus may not be large enough to provide optimal training, we were nevertheless able to attain relevancy discriminations for significant levels of recall (ranging from 11% to 47%) with 100% precision in half of our test runs.
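The abstract does not spell out how signatures are scored, so the following is only a minimal sketch of one way a signature-based relevancy filter and its precision/recall evaluation could look. It assumes a signature is a (trigger word, concept) pair and that a signature is trusted when it appears often enough in training and almost always in relevant texts. The function names, the two thresholds, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def learn_signatures(training, min_count=2, min_rate=0.9):
    """Pick signatures that are frequent and strongly tied to relevant texts.

    training: list of (set_of_signatures, is_relevant) pairs.
    min_count and min_rate are illustrative thresholds (assumptions).
    """
    seen = Counter()      # number of training texts containing each signature
    relevant = Counter()  # number of those texts that were relevant
    for signatures, is_relevant in training:
        for sig in signatures:
            seen[sig] += 1
            if is_relevant:
                relevant[sig] += 1
    return {sig for sig, n in seen.items()
            if n >= min_count and relevant[sig] / n >= min_rate}

def classify(signatures, trusted):
    """Call a text relevant if it contains at least one trusted signature."""
    return not trusted.isdisjoint(signatures)

def precision_recall(predictions, labels):
    """Return (precision, recall); None where the ratio is undefined."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

# Toy data: each text is reduced to a set of hypothetical signatures.
training = [
    ({("killed", "murder"), ("bomb", "bombing")}, True),
    ({("killed", "murder")}, True),
    ({("killed", "murder"), ("said", "statement")}, True),
    ({("dead", "murder")}, False),
    ({("said", "statement")}, False),
]
test = [
    ({("killed", "murder")}, True),
    ({("bomb", "bombing")}, True),
    ({("said", "statement")}, False),
]

trusted = learn_signatures(training)
predictions = [classify(sigs, trusted) for sigs, _ in test]
precision, recall = precision_recall(predictions, [label for _, label in test])
print(trusted, precision, recall)
```

On this toy data the filter misses one relevant text but makes no false positive, which is the kind of trade-off the abstract reports: partial recall at perfect precision.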

Similar resources

Augmenting with Slot Filler Relevancy Signatures Data

Human readers can reliably identify many relevant texts merely by skimming the texts for domain-specific cues. These quick relevancy judgements require two steps: (1) recognizing an expression that is highly relevant to the given domain, e.g. "were killed" in the domain of terrorism, and (2) verifying that the context surrounding the expression is consistent with the relevancy guidelines for th...

Extraction-based Text Categorization: Generating Domain-specific Role Relationships Automatically

In previous work, we developed several algorithms that use information extraction techniques to achieve high-precision text categorization. The relevancy signatures algorithm classifies texts using extraction patterns, and the augmented relevancy signatures algorithm classifies texts using extraction patterns and semantic features associated with role fillers (Riloff and Lehnert, 1994). These algor...

Information Extraction as a Basis for High-precision Text Classification

We describe an approach to text classification that represents a compromise between traditional word-based techniques and in-depth natural language processing. Our approach uses a natural language processing task called information extraction as a basis for high-precision text classification. We present three algorithms that use varying amounts of extracted information to classify texts. The rele...

Verification of Effective Retrieval Method for Anchor Text on Navigational Retrieval

We participated in the NTCIR-5 WEB Navigational Retrieval Subtask (Navi-2) in order to verify the most effective retrieval method for the index of anchor texts by using a retrieval system that indexed only anchor texts instead of full texts of Web pages. We introduced retrieval methods that combine one or more of six retrieval measures: (a) anchor frequency (af), (b) reference consistency (rc), (c) ...

Analysis and Generation of Emotion in Texts

This paper explores the task of automatic emotion analysis and generation in texts. We present preliminary results for the task of classifying texts by classes of emotions. Then, we present detailed experiments in classifying texts by classes of mood. We propose a novel approach that uses the hierarchy of possible moods in order to achieve better results than a standard flat classification. We ...

Publication date: 1992